Identifying Source-Language Dialects in Translation

Authors

Abstract

In this paper, we aim to explore the degree to which translated texts preserve linguistic features of dialectal varieties. We release a dataset of augmented annotations to the Proceedings of the European Parliament that cover speaker information, and analyze different classes of written English covering native varieties from the British Isles. Our analyses discuss the discriminatory features between these classes and reveal words whose usage differs between varieties of the same language. We perform classification experiments and show that automatically distinguishing between the varieties is possible with high accuracy, even after translation, and we propose a new explainability method based on embedding alignments in order to reveal specific differences between dialects at the level of the vocabulary.
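
The abstract mentions an explainability method based on embedding alignments but does not say which alignment procedure is used. As a rough illustration of the general idea only, the sketch below assumes orthogonal Procrustes alignment between two embedding matrices trained separately on two varieties, then ranks shared words by cosine distance after alignment. The function names (`procrustes_align`, `cosine_shift`, `rank_divergent_words`) and the choice of Procrustes are assumptions made for this example, not the authors' method.

```python
import numpy as np


def procrustes_align(source, target):
    """Return the orthogonal matrix W minimizing ||source @ W - target||_F.

    Both inputs are (n_words, dim) arrays whose rows refer to the same words.
    Assumption: orthogonal Procrustes is used here only as one common way to
    align embedding spaces; the source does not name a specific procedure.
    """
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt


def cosine_shift(source, target, w):
    """Cosine distance between each aligned source row and its target row."""
    aligned = source @ w
    num = np.sum(aligned * target, axis=1)
    den = np.linalg.norm(aligned, axis=1) * np.linalg.norm(target, axis=1)
    return 1.0 - num / den


def rank_divergent_words(vocab, source, target):
    """Rank shared words from most to least divergent after alignment."""
    w = procrustes_align(source, target)
    shift = cosine_shift(source, target, w)
    order = np.argsort(-shift)  # largest distance first
    return [(vocab[i], float(shift[i])) for i in order]
```

Under these assumptions, words at the top of the ranking are candidates for vocabulary whose usage differs between the two varieties; whether they reflect real dialectal differences would still need manual inspection.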

Similar resources

Machine Translation of Arabic Dialects

Arabic dialects present many challenges for machine translation, not least of which is the lack of data resources. We use crowdsourcing to cheaply and quickly build Levantine-English and Egyptian-English parallel corpora, consisting of 1.1M words and 380k words, respectively. The dialectal sentences are selected from a large corpus of Arabic web text, and translated using Amazon’s Mechanical Tur...

Identifying regional dialects in online social media

Electronic social media offers new opportunities for informal communication in written language, while at the same time, providing new datasets that allow researchers to document dialect variation from records of natural communication among millions of individuals. The unprecedented scale of this data enables the application of quantitative methods to automatically discover the lexical variable...

Facilitating Translation Using Source Language Paraphrase Lattices

For resource-limited language pairs, coverage of the test set by the parallel corpus is an important factor that affects translation quality in two respects: 1) out-of-vocabulary words; 2) the same information in an input sentence can be expressed in different ways, while current phrase-based SMT systems cannot automatically select an alternative way to transfer the same information. Therefore,...

Identifying Translation Effects in English Natural Language Text

Declaration: I declare that this thesis has not been submitted as an exercise for a degree at this or any other university and it is entirely my own work. I agree to deposit this thesis in the University's open access institutional repository or allow the library to do so on my behalf, subject to Irish Copyright Legislation and Trinity College Library conditions of use and acknowledgement. Summ...

Identifying dialects with textual and acoustic cues

We describe several systems for identifying short samples of Arabic or Swiss-German dialects, which were prepared for the shared task of the 2017 DSL Workshop (Zampieri et al., 2017). The Arabic data comprises both text and acoustic files, and our best run combined both. The Swiss-German data is text-only. Coincidentally, our best runs achieved an accuracy of nearly 63% on both the Swiss-German and A...

Journal

Journal title: Mathematics

Year: 2022

ISSN: 2227-7390

DOI: https://doi.org/10.3390/math10091431